class: center, middle, inverse, title-slide

# Introduction to R for Data Analysis
## Outlook
### Johannes Breuer & Stefan Jünger
### 2021-08-06

---
layout: true

---

## Recap: Course schedule - Day 1

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;color: gray !important;"> Monday </td> <td style="text-align:left;color: gray !important;"> 10:30 - 11:30 </td> <td style="text-align:left;font-weight: bold;"> Getting Started with R and RStudio </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Monday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 11:30 - 11:45 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Monday </td> <td style="text-align:left;color: gray !important;"> 11:45 - 12:45 </td> <td style="text-align:left;font-weight: bold;"> Getting Started with R and RStudio </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Monday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:45 - 13:45 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Monday </td> <td style="text-align:left;color: gray !important;"> 13:45 - 15:00 </td> <td style="text-align:left;font-weight: bold;"> Data Import & Export </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Monday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 15:00 - 15:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Monday </td> <td style="text-align:left;color: gray !important;"> 15:15 - 16:30 </td> <td style="text-align:left;font-weight: bold;"> Data Import & Export </td> </tr>
</tbody>
</table>

---

## Recap: Course schedule - Day 2

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: gray !important;"> 10:00 - 11:15 </td> <td style="text-align:left;font-weight: bold;"> Data Wrangling - Basics </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 11:15 - 11:30 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: gray !important;"> 11:30 - 12:45 </td> <td style="text-align:left;font-weight: bold;"> Data Wrangling - Basics </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:45 - 13:45 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: gray !important;"> 13:45 - 15:00 </td> <td style="text-align:left;font-weight: bold;"> Data Wrangling - Advanced </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 15:00 - 15:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Tuesday </td> <td style="text-align:left;color: gray !important;"> 15:15 - 16:30 </td> <td style="text-align:left;font-weight: bold;"> Data Wrangling - Advanced </td> </tr>
</tbody>
</table>

---

## Recap: Course schedule - Day 3

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: gray !important;"> 10:00 - 11:15 </td> <td style="text-align:left;font-weight: bold;"> Exploratory Data Analysis </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 11:15 - 11:30 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: gray !important;"> 11:30 - 12:45 </td> <td style="text-align:left;font-weight: bold;"> Exploratory Data Analysis </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:45 - 13:45 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: gray !important;"> 13:45 - 15:00 </td> <td style="text-align:left;font-weight: bold;"> Data Visualization - Part 1 </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 15:00 - 15:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Wednesday </td> <td style="text-align:left;color: gray !important;"> 15:15 - 16:30 </td> <td style="text-align:left;font-weight: bold;"> Data Visualization - Part 1 </td> </tr>
</tbody>
</table>

---

## Recap: Course schedule - Day 4

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;color: gray !important;"> Thursday </td> <td style="text-align:left;color: gray !important;"> 10:00 - 11:15 </td> <td style="text-align:left;font-weight: bold;"> Confirmatory Data Analysis </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Thursday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 11:15 - 11:30 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Thursday </td> <td style="text-align:left;color: gray !important;"> 11:30 - 12:45 </td> <td style="text-align:left;font-weight: bold;"> Confirmatory Data Analysis </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Thursday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:45 - 13:45 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Thursday </td> <td style="text-align:left;color: gray !important;"> 13:45 - 15:00 </td> <td style="text-align:left;font-weight: bold;"> Data Visualization - Part 2 </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Thursday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 15:00 - 15:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Thursday </td> <td style="text-align:left;color: gray !important;"> 15:15 - 16:30 </td> <td style="text-align:left;font-weight: bold;"> Data Visualization - Part 2 </td> </tr>
</tbody>
</table>

---

## Recap: Course schedule - Day 5

<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;color: gray !important;"> Friday </td> <td style="text-align:left;color: gray !important;"> 10:00 - 11:15 </td> <td style="text-align:left;font-weight: bold;"> Reporting with R Markdown </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Friday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 11:15 - 11:30 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Friday </td> <td style="text-align:left;color: gray !important;"> 11:30 - 12:45 </td> <td style="text-align:left;font-weight: bold;"> Reporting with R Markdown </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> Friday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:45 - 13:45 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Friday </td> <td style="text-align:left;color: gray !important;"> 13:45 - 15:00 </td> <td style="text-align:left;font-weight: bold;"> Advanced Use of R, Outlook, Q&A </td> </tr>
  <tr> <td style="text-align:left;color:
gray !important;color: gray !important;"> Friday </td> <td style="text-align:left;color: gray !important;color: gray !important;"> 15:00 - 15:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr>
  <tr> <td style="text-align:left;color: gray !important;"> Friday </td> <td style="text-align:left;color: gray !important;"> 15:15 - 16:30 </td> <td style="text-align:left;font-weight: bold;"> Advanced Use of R, Outlook, Q&A </td> </tr>
</tbody>
</table>

---

## Where to go from here?

Hopefully, after this week, you feel prepared to take your next steps in `R`. Some recommendations for continuing your jou`R`ney:

- Keep working with `R`!

- If time permits, do stuff you usually do in `SPSS` or `Stata` in `R`, even when it's harder

- Try to do at least one research task solely in `R` (one analysis, a whole paper, a report, etc.)

- Look for tutorials and guides online
  - trust us, there's way more (good & free) online material for `R` than there is, e.g., for `SPSS` or `Stata`

---

## You voted for: Multilevel models

Multilevel/Hierarchical or Mixed Regression Models (let's stick with MLM) have been quite popular for some time. They allow the incorporation of

- random intercepts
- random slopes

Simply put: regression coefficients that vary across groups.

---

## MLM in `R`

Of course, you can estimate such models in `R` as well. There are packages for that:

- [`lme4`](https://cran.r-project.org/web/packages/lme4/index.html) may be the most popular one
  - provides functions to estimate linear models (`lmer()`) or generalized linear models (`glmer()`)
  - ...

- but there are others, e.g., the `clmm2()` function from the [`ordinal`](https://cran.r-project.org/web/packages/ordinal/index.html) package for ordinal responses

---

## Toy example

Let's start with loading the data and creating fictional clusters.
```r
# for %>%, mutate(), and n()
library(dplyr)

# load data
gp_covid <- readRDS("./data/corona_survey.rds")

# simulate clusters
gp_covid <-
  gp_covid %>%
  mutate(
    clusters = sample(1:50, n(), replace = TRUE)
  )

table(gp_covid$clusters)
```

```
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
## 75 82 82 87 69 66 91 67 90 68 67 73 76 75 64 81 79 70 69 68 68 72 
## 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 
## 67 60 83 80 64 59 70 75 93 73 71 93 84 90 78 65 75 78 81 70 73 75 
## 45 46 47 48 49 50 
## 75 69 94 74 68 89
```

---

## Re-running models from the course as MLM: random intercept

```r
library(lme4)

mlm <-
  lmer(
    risk_self ~ 1 + left_right + (1 | clusters),
    data = gp_covid
  )
```

---

## Standard Output

.small[
```r
summary(mlm)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: risk_self ~ 1 + left_right + (1 | clusters)
##    Data: gp_covid
## 
## REML criterion at convergence: 10286.5
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.44597 -0.85935 -0.07264  0.71647  2.30069 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  clusters (Intercept) 0.000    0.000   
##  Residual             1.606    1.267   
## Number of obs: 3103, groups:  clusters, 50
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 4.084441   0.060923  67.043
## left_right  0.001522   0.012146   0.125
## 
## Correlation of Fixed Effects:
##            (Intr)
## left_right -0.928
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
```
]

---

## Output using the `parameters` package

.small[
```r
library(parameters)

model_parameters(mlm)
```

```
## # Fixed Effects
## 
## Parameter   | Coefficient |   SE |        95% CI | t(3099) |      p
## -------------------------------------------------------------------
## (Intercept) |        4.08 | 0.06 | [ 3.97, 4.20] |   67.04 | < .001
## left_right  |    1.52e-03 | 0.01 | [-0.02, 0.03] |    0.13 | 0.900 
## 
## # Random Effects
## 
## Parameter                | Coefficient
## --------------------------------------
## SD (Intercept: clusters) |        0.00
## SD (Residual)            |        1.13
```
]

---

## Adding a random slope

```r
library(lme4)

mlm_2 <-
  lmer(
    risk_self ~ 1 + left_right + (1 + left_right | clusters),
    data = gp_covid
  )
```

---

## Output using the `parameters` package

.small[
```r
model_parameters(mlm_2)
```

```
## # Fixed Effects
## 
## Parameter   | Coefficient |   SE |        95% CI | t(3097) |      p
## -------------------------------------------------------------------
## (Intercept) |        4.08 | 0.06 | [ 3.96, 4.21] |   64.98 | < .001
## left_right  |    1.57e-03 | 0.01 | [-0.02, 0.03] |    0.13 | 0.899 
## 
## # Random Effects
## 
## Parameter                 | Coefficient
## ---------------------------------------
## SD (Intercept: clusters)  |        0.11
## SD (left_right: clusters) |        0.02
## Cor (Intercept~clusters)  |       -1.00
## SD (Residual)             |        1.13
```
]

---

## btw: You can also plot MLM with `easystats`

.pull-left[
```r
model_parameters(mlm) %>%
  plot()
```
]

.pull-right[
<img src="data:image/png;base64,#5_2_Outlook_files/figure-html/unnamed-chunk-2-1.png" width="95%" style="display: block; margin: auto;" />
]

---

## ...or `sjPlot` for predictions

.pull-left[
```r
library(sjPlot)

plot_model(mlm, type = "pred")
```
]

.pull-right[
```
## $left_right
```

<img src="data:image/png;base64,#5_2_Outlook_files/figure-html/unnamed-chunk-3-1.png" width="95%" style="display: block; margin: auto;" />
]

---

## Working with other data types

Johannes and Stefan use different data types in their daily work:

- digital trace data (Johannes) <sup>*</sup>
- georeferenced/geospatial data (Stefan)<sup>**</sup>

**Remember that `R` is data-agnostic! It can serve as a fancy data science tool for extracting social media data but also as a full-blown Geographic Information System (GIS)**

.footnote[
<sup>*</sup> see, e.g., https://github.com/jobreu/twitter-linking-workshop-2021

<sup>**</sup> see, e.g., https://github.com/StefanJuenger/gesis-workshop-geospatial-techniques-R
]

---

## What Are Geospatial Data?

.pull-left[
Data with a direct spatial reference `\(\rightarrow\)` **geo-coordinates**

- Information about geometries
- Optional: Content in relation to the geometries

Can be projected jointly in one single space

- Allows data linking and extraction of substantive information
]

.pull-right[
<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\r-intro-gesis-2021\content\img\fig_geometries.png" width="85%" style="display: block; margin: auto;" />

.tinyisher[Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019]
]

---

## Mapping is so easy nowadays

.pull-left[
```r
library(mapsf)

mtq <- mf_get_mtq()

mf_map(x = mtq)
mf_map(x = mtq, var = "POP", type = "prop")
mf_layout(
  title = "Population in Martinique",
  credits = "T. Giraud; Sources: INSEE & IGN, 2018"
)
```
]

.pull-right[
<img src="data:image/png;base64,#5_2_Outlook_files/figure-html/mapsf-print-1.png" style="display: block; margin: auto;" />
]

Example from: https://riatelab.github.io/mapsf/

---

## Interactive Mapping!
.pull-left[
```r
library(mapview)

mapview(mtq["POP"])
```
]

.pull-right[
]

---

## 'Web development' using `R`

These days, a lot of `R` packages provide tools originally developed for the web. For example:

- [bookdown](https://cran.r-project.org/web/packages/bookdown/index.html) enables you to publish your book written in `R Markdown` online
- [pkgdown](https://cran.r-project.org/web/packages/pkgdown/index.html) builds a documentation website for your own `R` package
- [blogdown](https://cran.r-project.org/web/packages/blogdown/index.html) is more general and helps you with creating websites (example to follow)

---

## Shiny apps

> Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.

https://shiny.rstudio.com/

---
class: middle

## Example 1: Movie Explorer

.center[https://shiny.rstudio.com/gallery/movie-explorer.html]

---
class: middle

## Example 2: CRAN explorer

.center[https://gallery.shinyapps.io/cran-explorer/]

---

## Creating your own homepage with `R`

.pull-left[
<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\r-intro-gesis-2021\content\img\homepage_johannes.png" width="1319" style="display: block; margin: auto;" />

.center[.small[https://www.johannesbreuer.com/]]
]

.pull-right[
<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\r-intro-gesis-2021\content\img\homepage_stefan.png" width="1315" style="display: block; margin: auto;" />

.center[.small[https://stefanjuenger.github.io/]]
]

.center[Powered by [`blogdown`](https://cran.r-project.org/web/packages/blogdown/index.html) & [Hugo Academic](https://academic-demo.netlify.app/)]

---

## Writing your own `R` packages

.pull-left[
At a certain point (not now!), you may want to consider writing your own `R` package

- useful for creating reproducible code
- great for distributing your work to others
- for example, we created an [`R` package](https://stefanjuenger.github.io/woRkshoptools/) to facilitate working on our workshop materials
]

.pull-right[
<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\r-intro-gesis-2021\content\img\r_packages.jpg" width="1381" style="display: block; margin: auto;" />

[Read the book here!](https://r-pkgs.org/)
]

---
class: middle

## It's straightforward in `RStudio`

<img src="data:image/png;base64,#C:\Users\mueller2\talks_presentations\r-intro-gesis-2021\content\img\new_package.png" width="75%" style="display: block; margin: auto;" />

---

## Acknowledgements ❤️

All slides were created with the `R` package [`xaringan`](https://github.com/yihui/xaringan) which builds on [`remark.js`](https://remarkjs.com), [`knitr`](http://yihui.name/knitr), and [`RMarkdown`](https://rmarkdown.rstudio.com).

The exercises were created with the [`unilur` package](https://github.com/koncina/unilur).

Please make sure to properly cite all data that you use for your research (archives usually provide suggested citations).

Also make sure to cite the free and open-source software (FOSS) that you use, such as `R` and the packages for it. To find out how to do that, you can use the function `citation("packagename")` in `R`.

We thank the *GESIS Training* team for taking good care of the organization of this course (and the whole Summer School) and all of you for participating!
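
---

## Citing `R` and its packages: a small sketch

The citation advice from the acknowledgements can be tried out directly in the console. The following is only a minimal sketch using base `R` functions; `"stats"` is used as the package name purely for illustration because it ships with every `R` installation, and the object names `r_citation` and `pkg_citation` are made up for this example.

```r
# Citation for R itself (no package name given)
r_citation <- citation()
print(r_citation, style = "text")

# Citation for a specific package: the name is passed as a character string.
# "stats" is only an illustrative choice; swap in any installed package,
# e.g., "lme4".
pkg_citation <- citation("stats")
pkg_citation

# BibTeX entry, e.g., for a reference manager or an R Markdown bibliography
toBibtex(r_citation)
```

Calling `citation()` without an argument returns the reference for `R` itself, so it makes sense to run it once for `R` and once for each package that mattered for your analysis.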